Creating reproducible reports

A workshop in reproducible research for the R novice


Session 1



Richard Layton

Department of Mechanical Engineering
Rose-Hulman Institute of Technology
2016-08-24

Welcome


How complete is your homework? Find a partner at the same level

  • complete
  • partially complete
  • we had homework?


Get connected

Getting started


Introductions
Handouts


Write down your ideas in response to Mystery question 1:


What is reproducible research?

Practitioners tell us:


Research is reproducible when the data and the code used to obtain a finding are available and sufficient for an independent researcher to recreate the finding.


  • computational, data-intensive

  • spans the full data, analysis, & publication workflow

  • most of us have received only perfunctory training (if any)


Victoria Stodden, F. Leisch, & R. Peng, eds., Implementing Reproducible Research, CRC Press, 2014.
Christopher Gandrud, Reproducible Research with R and RStudio, 2/e, CRC Press, 2015.

Events tell us:


More accountability is needed because of

  • data falsification
  • erroneous analysis
  • misleading presentation of results


Karen EC Levy & David Merritt Johns, When open data is a Trojan Horse: The weaponization of transparency in science and governance, Big Data and Society, 2016.

Attempts to reproduce this work revealed . . .

the primary findings were false. The major effect disappeared after correcting for

  • coding errors

  • selective exclusion of available data

  • unconventional weighting of summary statistics


Kenneth Rogoff & Carmen Reinhart


Thomas Herndon, Michael Ash, & Robert Pollin, Does high public debt consistently stifle economic growth? A critique of Reinhart and Rogoff, Political Economy Research Institute, U Mass Amherst, 2013.

Attempts to reproduce this work revealed . . .

data were falsified to obtain the research outcomes he wanted, resulting in

  • retracted journal articles (11 to date)

  • terminated clinical trials

  • cancelled research funding

  • civil suit by patients


Anil Potti


Jason deBruyn, Trial involving disgraced scientist and bunk Duke research to begin Monday, Triangle Business Journal, 2015-01-23.
Ivan Oransky, It’s official: Anil Potti faked cancer research data, say Feds, Retraction Watch, 2015-11-07.

However, open science has also been “weaponized”

Scientists and skeptics are in a knife fight, and you don’t bring data to a knife fight.
— Paul Ehrlich

Why should I make the data available to you, when your aim is to try and find something wrong with it?
— Phil Jones


1000 years of temperature variation: the “hockey stick” graph by Michael Mann


Fred Pearce, Climate change debate overheated after sceptic grasped ‘hockey stick’, The Guardian, 2010-02-09.
Brad Keyes, Mann retirement: Analysis, reax, Climate Sceptic, 2016-05-08.
Jeff Leek, De-weaponizing reproducibility, 2015-03-13.

The primary beneficiary is you

If you do anything “by hand” once, you’ll do it 100 times.

— Paul Wilson, UW–Madison

Your closest collaborator is you, six months ago. Have you tried to email that slacker?

— Karl Broman, UW–Madison

To preserve sanity, stop collaborating via email, attachments, and tracking changes in Word.

— Jenny Bryan, UBC

Steps you can take towards reproducibility

  • Write scripts (avoid manual copy, paste, mouse-clicks)

  • Plan the organization and naming scheme for files

  • Strive for simplicity, readability, reusability, and testability

  • Agree on a workflow for collaborating before starting a manuscript

  • DRY (don’t repeat yourself)

  • Link files explicitly

  • Plan data management

  • Postpone optimization

  • Use version control

  • License your software


Karl Broman, Initial steps toward reproducible research.
Jenny Bryan, Karen Cranston, Justin Kitzes, Lex Nederbragt, Tracy Teal, and Greg Wilson, Good enough practices for scientific computing, 2016-01.

Steps you can take towards reproducibility today

  • Write scripts (avoid manual copy, paste, mouse-clicks)

  • Plan the organization and naming scheme for files

  • DRY (don’t repeat yourself)

  • Link files explicitly
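
A minimal sketch of how these four steps might look in a single R script. The directory names, file names, made-up data values, and the helper `summarize_column()` are all hypothetical, chosen only for illustration; they are not taken from the workshop materials.

```r
# Hypothetical project layout (names are assumptions, not from the workshop):
#   project/
#     data/     raw data files, never edited by hand
#     results/  files produced by scripts

# A temporary directory is used so the sketch runs anywhere.
project     <- file.path(tempdir(), "project")
data_dir    <- file.path(project, "data")
results_dir <- file.path(project, "results")
dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)
dir.create(results_dir, recursive = TRUE, showWarnings = FALSE)

# Made-up values standing in for a real raw-data file.
raw <- data.frame(input = c(1, 2, 3, 4), output = c(2.1, 3.9, 6.2, 7.8))
write.csv(raw, file.path(data_dir, "raw.csv"), row.names = FALSE)

# DRY: one function replaces a step that would otherwise be copied and pasted.
summarize_column <- function(df, column) {
  data.frame(
    column = column,
    mean   = mean(df[[column]]),
    sd     = sd(df[[column]])
  )
}

# Link files explicitly: inputs are read by path and outputs written by path.
dat <- read.csv(file.path(data_dir, "raw.csv"))
summary_table <- do.call(
  rbind,
  lapply(names(dat), function(col) summarize_column(dat, col))
)
write.csv(summary_table, file.path(results_dir, "summary.csv"),
          row.names = FALSE)
```

Because every step lives in the script, rerunning it after the raw data change regenerates the results file with no manual copy, paste, or mouse-clicks.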

Learning objectives


  • Describe the problems that reproducibility helps solve

  • Identify non-reproducible practices in their current workflow

  • List two basic principles of reproducible research

  • Organize directories and files for reproducibility

  • Create a reproducible report using R and RStudio

Consider a sample report


Imagine that you were the author of the “Load cell calibration report”


Carefully review the report and answer Mystery question 2:


Identify as many “manual operations” as possible.

Agenda for the remainder of this session

  • tutorials with two 15-minute breaks
  • session concludes with Mystery question 3


Tutorials to create a dynamic report

  • pre-workshop homework (if incomplete)
  • organize your files
  • start your first script
  • explore the data
  • tidy the data
  • create the calibration graph
  • perform a linear regression
  • write the client report

Session 1 wrap-up


Homework:

  1. Continue the tutorials as far as you wish

  2. If you want to start your own reproducible project

    • complete as much of the self-paced tutorials as you can
    • bring your project data, analysis, etc. to session 2


Mystery question 3 (turn this one in to me)


What was the muddiest point in the workshop so far?